Biostatistics For Dummies (Monika Wahi, John Pezzullo)

skater scored an average of 9.0 compared to another who scored an average of 5.0. You will not know what the skate routines looked like unless you watch them, but the scores already tell you that if you were to watch them, you would expect the routine that scored 9.0 to have been executed in a more visually pleasing way than the one that scored 5.0.

Frequency distributions have names for their important characteristics, including:

Center: Where along the distribution of the values do the numbers tend to center?

Dispersion: How much do these numbers spread out?

Symmetry: If you were to draw a vertical line down the middle of the distribution, would that line act like a mirror, with an identical shape reflected on both sides? Or do the two sides look noticeably different, and if so, how?

Shape: Is the top of the distribution nicely rounded, or pointier, or flatter?

Just as average skating scores summarize the visual appeal of an Olympic skate routine, you describe a distribution by calculating and reporting numbers that measure each of these four characteristics. These numbers are what we mean by summary statistics for numerical variables.
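As a rough illustration, the sketch below computes one commonly used number for each of the four characteristics. The choice of measures is our assumption for this example (other measures exist for each characteristic): the mean for center, the sample standard deviation for dispersion, a moment-based skewness coefficient for symmetry, and excess kurtosis for shape. The data are the seven diastolic blood pressure (DBP) values listed in the Arithmetic mean section, and the function name `summarize` is hypothetical.

```python
import statistics

def summarize(values):
    """Return one illustrative measure per distribution characteristic.

    Assumed choices: mean (center), sample SD (dispersion),
    moment-based skewness (symmetry), excess kurtosis (shape).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    # Central moments (dividing by n) for the moment-based coefficients
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5            # 0 for a perfectly symmetric sample
    excess_kurtosis = m4 / m2 ** 2 - 3   # near 0 for normally distributed data
    return {
        "center": mean,
        "dispersion": sd,
        "symmetry": skewness,
        "shape": excess_kurtosis,
    }

# DBP values (mmHg) from the Arithmetic mean section
dbp = [84, 84, 89, 91, 110, 114, 116]
for name, value in summarize(dbp).items():
    print(f"{name:>10}: {value:.2f}")
```

A positive skewness value means the longer tail is on the high side; a negative excess kurtosis means the top is flatter than a normal curve's.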

Locating the center of your data

When you start exploring a set of numbers, an important first step is to determine what value they tend to center around. This characteristic is called, intuitively enough, central tendency. Many statistical textbooks describe three measures of central tendency: mean (which is the same as average), median, and mode. You may assume these are the three optimal measures to describe a distribution (because they all begin with m and are easy to remember). But all three have limitations, especially when dealing with data obtained from samples in human research, as described in the following sections.
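To see that the three measures need not agree, here is a short sketch using Python's standard `statistics` module on the seven DBP values listed in the Arithmetic mean section. Using Python here is our choice for illustration; the book itself does not prescribe any software.

```python
import statistics

# Seven diastolic blood pressure values (mmHg), as listed in the
# Arithmetic mean section
dbp = [84, 84, 89, 91, 110, 114, 116]

mean = statistics.mean(dbp)      # sum of the values / number of values
median = statistics.median(dbp)  # middle value of the sorted data
mode = statistics.mode(dbp)      # most frequently occurring value

print(f"mean   = {mean:.1f} mmHg")
print(f"median = {median} mmHg")
print(f"mode   = {mode} mmHg")
```

For these data the three measures come out as about 98.3, 91, and 84 mmHg, which shows how much the choice of measure can matter for a small, uneven sample.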

Arithmetic mean

The arithmetic mean, also commonly called the mean (or the average), is the most familiar and most often quoted measure of central tendency. Throughout this book, whenever we use the two-word term "the mean," we're referring to the arithmetic mean. (There are several other kinds of means besides the arithmetic mean, which we describe later in this chapter.)

The mean of a sample is often denoted by the symbol m or by placing a horizontal bar over the name of the variable, like $\bar{X}$. The mean is obtained by adding up the values and dividing by the sample size, meaning how many values there are. (If you are using software for this, make sure missing values are excluded, or the calculation will not work.) Here's a small sample of numbers, the diastolic blood pressure (DBP) values of seven study participants (in mmHg), arranged in increasing numerical order: 84, 84, 89, 91, 110, 114, and 116. For the DBP sample:

$$m = \frac{84 + 84 + 89 + 91 + 110 + 114 + 116}{7} = \frac{688}{7} \approx 98.3 \text{ mmHg}$$
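The add-then-divide recipe, including the advice to exclude missing values, can be sketched in a few lines of Python. This is a minimal sketch that assumes missing values are coded as `None` or `float("nan")`; the function name `arithmetic_mean` and the two missing entries are hypothetical additions for illustration.

```python
import math

def arithmetic_mean(values):
    """Add up the non-missing values and divide by how many there are.

    Missing values are assumed to be coded as None or float('nan');
    they are dropped first so they cannot break the calculation.
    """
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not present:
        raise ValueError("no non-missing values to average")
    # Sample size here means the number of values actually present
    return sum(present) / len(present)

# DBP values (mmHg) from the text, plus two hypothetical missing entries
dbp = [84, 84, 89, None, 91, 110, float("nan"), 114, 116]
print(round(arithmetic_mean(dbp), 1))  # prints 98.3
```

Dividing by the count of non-missing values, rather than by the full list length, keeps the missing entries from silently pulling the mean down.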

You can write the general formula for the arithmetic mean of N values contained in the variable X in several ways: